The Architecture Of A Standard Arabic Lexical Database: Some Figures, Ratios And Categories From The DIINAR.1 Source Program

Authors

  • Ramzi Abbes
  • Joseph Dichy
  • Mohamed Hassoun
Abstract

This paper is a contribution to the issue, which has become critical in the course of the last decade, of the basic requirements and validation criteria for lexical language resources in Standard Arabic. The work is based on a critical analysis of the architecture of the DIINAR.1 lexical database, the entries of which are associated with grammar-lexis relations operating at word-form level (i.e. in morphological analysis). Investigation shows a crucial difference, in the concept of ‘lexical database’, between source program and generated lexica. The source program underlying DIINAR.1 is analysed, and some figures and ratios are presented. The original categorisations are, in the course of scrutiny, partly revisited. Results and ratios are given here for basic entries on the one hand, and for generated lexica of inflected word-forms on the other. They aim at giving a first answer to the question of the ratios between the number of lemma-entries and inflected word-forms that can be expected to be included in, or generated by, a Standard Arabic lexical database. These ratios can be considered as one overall language-specific criterion for the analysis, evaluation and validation of lexical databases in Arabic.

Similar resources

Roots & Patterns vs. Stems plus Grammar-Lexis Specifications: on what basis should a multilingual lexical database centred on Arabic be built?

Machine translation engines draw on various types of databases. This paper is concerned with Arabic as a source or target language, and focuses on lexical databases. The non-concatenative nature of Arabic morphology, the complex structure of Arabic word-forms, and the general use of vowel-free writing present a real challenge to NLP developers. We show here how and why a stem-grounded lexical d...

On lemmatization in Arabic,

This work is a ‘prospective extension’ of the lexical work achieved in the DIINAR-MBC Euro-Mediterranean project. It aims at contributing to the crucial issue in the field of Arabic NLP of the operations involved in lemmatization, which are necessarily based on a definition of the Arabic entries of a monolingual or multilingual lexical database. As shown in previous work, lexical entries can be...

The Concept of Architecture in the Transitional Period from the Sasanian Era to the Islamic Era: An Introduction to the Conceptual History of Iranian Architecture

The common division of Iranian architectural history into two separate periods, pre-Islamic and Islamic, which was once efficient, has gradually lost its motivating power. Such a cliché has had consequences, including disregard for the architecture of the very transitional era from the late Sassanid to the early Islamic period. Understanding the real changes, evolutions, breaks, and continuity...

Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents

Besides its own merits, text classification (TC) has become a cornerstone in many applications. The work presented here is part of, and a prerequisite for, a project we have undertaken to create a corpus for the Arabic text process. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...

The production of lexical categories (VP) and functional categories (copula) at the initial stage of child L2 acquisition

This is a longitudinal case study of two Farsi-speaking children learning English: ‘Bernard’ and ‘Melissa’, who were 7;4 and 8;4 at the start of data collection. The research deals with the initial state and further development in the child second language (L2) acquisition of syntax regarding the presence or absence of copula as a functional category, as well as the role and degree of L1 influe...

Publication date: 2004